add rust support by SchrodingerZhu · Pull Request #113 · microsoft/snmalloc

SchrodingerZhu · 2020-01-20T13:04:14Z

This pr tracks the work of adding a static linking target for rust support.

My suggestion is that the merging should be considered after the discussions in #109 and rust-lang/rust#68381 are finished.

msftclas · 2020-01-20T13:04:28Z

All CLA requirements met.

CMakeLists.txt

mjp41 · 2020-01-20T15:29:38Z

Also, I think it would make sense to move aligned_size into src/mem/sizeclass.h, then we don't need the code duplication:

  inline static size_t aligned_size(size_t alignment, size_t size)
  {
   ...
  }

Should work. Then you can remove it from malloc.cc and rust.cc.

SchrodingerZhu · 2020-01-21T00:35:56Z

I do not quite understand why cxx17 features are not available even with /std:c++17 set.

SchrodingerZhu · 2020-01-21T00:36:46Z

CMakeLists.txt

+    add_library(${name} ${type} ${ARGN})
+    target_link_libraries(${name} snmalloc_lib)
+    if(MSVC)
+      target_compile_definitions(${name} PRIVATE "SNMALLOC_EXPORT=__declspec(dllexport)")


Perhaps this is not actually needed, but have it added anyway.

I am not sure what is required to link this into Rust. As this is a static lib, I would have said it isn't needed. @snf any thoughts?

Pretty sure it's not required for a static lib

Could you drop this bit please, and I'll merge.

@mjp41 I went to beg by then. Sorry for the late reply.

mjp41 · 2020-01-21T10:29:51Z

So I don't know why, but disabling the line

    # Ensure that we do not link against C++ stdlib when compiling shims.
    set_target_properties(${name} PROPERTIES LINKER_LANGUAGE C)

for MSVC, fixes the C++17 compilation. Changing to

    # Ensure that we do not link against C++ stdlib when compiling shims.
    if(NOT MSVC)
      set_target_properties(${name} PROPERTIES LINKER_LANGUAGE C)
    endif()

should fix your build issues at least.

src/mem/sizeclass.h

SchrodingerZhu · 2020-01-21T12:55:14Z

with this as an experiment:

  SNMALLOC_FAST_PATH static size_t aligned_size(size_t alignment, size_t size)
  {
    // Client responsible for checking alignment is not zero
    assert(alignment != 0);
    // Client responsible for checking alignment is not above SUPERSLAB_SIZE
    assert(alignment <= SUPERSLAB_SIZE);
    // Client responsible for checking alignment is a power of two
    assert(bits::next_pow2(alignment) == alignment);

    size = bits::max(size, alignment);
    snmalloc::sizeclass_t sc = size_to_sizeclass(size);
    if (sc >= NUM_SIZECLASSES)
    {
      // large allocs are 16M aligned, which is maximum we guarantee
      return 1;
    }
    size_t t = 2;
    for (; sc < NUM_SIZECLASSES; sc++, t++)
    {
      size = sizeclass_to_size(sc);
      if ((size & (~size + 1)) >= alignment)
      {
        return t;
      }
    }
    // Give max alignment.
    return t;
  }

and

#include <iostream>
#include "src/snmalloc.h"
#include <cstdlib>
#define LIMIT 10000000
int main() {
    size_t all = 0;
    size_t max = 0;
    srand(time(NULL));
    for(size_t i = 0; i < LIMIT; ++i) {
        size_t a = snmalloc::bits::next_pow2(1 + (size_t)rand() % (snmalloc::SUPERSLAB_SIZE));
        size_t m = rand();
        auto t = snmalloc::aligned_size(a, m);
        all += t;
        max = std::max(max, t);
    }
    std::cout << "ave: " << (double)all / (double)LIMIT << ", max: " << max << std::endl;
}

I can get:

➜  cmake-build-debug git:(master) ✗ for i in $(seq 1 10); do ./test2; done
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00569, max: 5
ave: 1.00569, max: 5
ave: 1.00569, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5

So perhaps it is acceptable as the average case is quite good?

mjp41 · 2020-01-21T13:46:05Z

Not sure I follow. Are you trying to work out: how much we round up the size to account for alignment?

Should the following line subtract, m?

auto t = snmalloc::aligned_size(a, m);

Also, if a is large and m is small, then you can't be aligned, and not smaller than alignment, so perhaps

auto t = snmalloc::aligned_size(a, m) - std::max(a,m);

SchrodingerZhu · 2020-01-21T14:00:49Z

I changed the fake aligned_size function above to return the number of trials to get the aligned size.
The average times of trials are shown above.

In most cases, the memory size should be smaller. Hence, I test again with

size_t m = rand() % snmalloc::size_to_sizeclass(snmalloc::NUM_SIZECLASSES);

➜  cmake-build-debug git:(master) ✗ for i in $(seq 1 10); do ./test2; done
ave: 1.49993, max: 2
ave: 1.4999, max: 2
ave: 1.4999, max: 2
ave: 1.49986, max: 2
ave: 1.49986, max: 2
ave: 1.49981, max: 2
ave: 1.49981, max: 2
ave: 1.5001, max: 2
ave: 1.5001, max: 2
ave: 1.49982, max: 2

mjp41 · 2020-01-21T15:34:37Z

It is more the branches on the fast path that worry me. So looking at the disasm from Clang:

0000000000000000 <rust_alloc>:
   0:	48 89 f8             	mov    %rdi,%rax
   3:	64 48 8b 3c 25 00 00 	mov    %fs:0x0,%rdi
   a:	00 00 
   c:	48 39 c6             	cmp    %rax,%rsi
   f:	48 0f 46 f0          	cmovbe %rax,%rsi
  13:	48 8d 4e ff          	lea    -0x1(%rsi),%rcx
  17:	48 89 ca             	mov    %rcx,%rdx
  1a:	48 81 f9 ff ff 00 00 	cmp    $0xffff,%rcx
  21:	77 13                	ja     36 <rust_alloc+0x36>
  23:	48 83 e2 f8          	and    $0xfffffffffffffff8,%rdx
  27:	48 8b 92 00 00 00 00 	mov    0x0(%rdx),%rdx
  2e:	48 83 fa 4a          	cmp    $0x4a,%rdx
  32:	76 3c                	jbe    70 <rust_alloc+0x70>
  34:	eb 69                	jmp    9f <rust_alloc+0x9f>
  36:	48 83 ca 20          	or     $0x20,%rdx
  3a:	f3 4c 0f bd c2       	lzcnt  %rdx,%r8
  3f:	41 b9 3a 00 00 00    	mov    $0x3a,%r9d
  45:	45 31 d2             	xor    %r10d,%r10d
  48:	4d 29 c1             	sub    %r8,%r9
  4b:	41 0f 95 c2          	setne  %r10b
  4f:	44 89 ca             	mov    %r9d,%edx
  52:	44 29 d2             	sub    %r10d,%edx
  55:	80 c2 04             	add    $0x4,%dl
  58:	c4 e2 eb f7 d1       	shrx   %rdx,%rcx,%rdx
  5d:	83 e2 03             	and    $0x3,%edx
  60:	4a 8d 14 8a          	lea    (%rdx,%r9,4),%rdx
  64:	48 83 fa 4a          	cmp    $0x4a,%rdx
  68:	77 35                	ja     9f <rust_alloc+0x9f>
  6a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  70:	89 d1                	mov    %edx,%ecx
  72:	83 e1 7f             	and    $0x7f,%ecx
  75:	48 8b 34 cd 00 00 00 	mov    0x0(,%rcx,8),%rsi
  7c:	00 
  7d:	c4 e2 f0 f3 de       	blsi   %rsi,%rcx
  82:	48 39 c1             	cmp    %rax,%rcx
  85:	73 14                	jae    9b <rust_alloc+0x9b>
  87:	48 83 c2 01          	add    $0x1,%rdx
  8b:	48 83 fa 4a          	cmp    $0x4a,%rdx
  8f:	76 df                	jbe    70 <rust_alloc+0x70>
  91:	be 00 00 00 01       	mov    $0x1000000,%esi
  96:	e9 00 00 00 00       	jmpq   9b <rust_alloc+0x9b>
  9b:	48 8d 4e ff          	lea    -0x1(%rsi),%rcx
  9f:	48 81 f9 ff ff 00 00 	cmp    $0xffff,%rcx
  a6:	77 ee                	ja     96 <rust_alloc+0x96>
  a8:	48 83 e1 f8          	and    $0xfffffffffffffff8,%rcx
  ac:	48 8b b1 00 00 00 00 	mov    0x0(%rcx),%rsi
  b3:	48 8b 04 f7          	mov    (%rdi,%rsi,8),%rax
  b7:	48 85 c0             	test   %rax,%rax
  ba:	74 08                	je     c4 <rust_alloc+0xc4>
  bc:	48 8b 08             	mov    (%rax),%rcx
  bf:	48 89 0c f7          	mov    %rcx,(%rdi,%rsi,8)
  c3:	c3                   	retq   
  c4:	e9 00 00 00 00       	jmpq   c9 <rust_alloc+0xc9>
  c9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)

where as

0000000000003b60 <malloc>:
    3b60:	48 89 fe             	mov    %rdi,%rsi
    3b63:	48 8b 05 66 44 22 00 	mov    0x224466(%rip),%rax        # 227fd0 <.got+0x8>
    3b6a:	64 48 8b 38          	mov    %fs:(%rax),%rdi
    3b6e:	48 8d 46 ff          	lea    -0x1(%rsi),%rax
    3b72:	48 3d ff ff 00 00    	cmp    $0xffff,%rax
    3b78:	77 20                	ja     3b9a <malloc+0x3a>
    3b7a:	48 83 e0 f8          	and    $0xfffffffffffffff8,%rax
    3b7e:	48 8d 0d 8b 1e 01 00 	lea    0x11e8b(%rip),%rcx        # 15a10 <_ZN8snmallocL18sizeclass_metadataE>
    3b85:	48 8b 34 08          	mov    (%rax,%rcx,1),%rsi
    3b89:	48 8b 04 f7          	mov    (%rdi,%rsi,8),%rax
    3b8d:	48 85 c0             	test   %rax,%rax
    3b90:	74 0d                	je     3b9f <malloc+0x3f>
    3b92:	48 8b 08             	mov    (%rax),%rcx
    3b95:	48 89 0c f7          	mov    %rcx,(%rdi,%rsi,8)
    3b99:	c3                   	retq   
    3b9a:	e9 21 f2 ff ff       	jmpq   2dc0 <_ZN8snmalloc9AllocatorINS_24MemoryProviderStateMixinINS_8PALLinuxEEENS_15DefaultChunkMapINS_21GlobalPagemapTemplateINS_11FlatPagemapILm24EhEEEEEELb1EXadL_ZNS_16lazy_replacementEPvEEE15alloc_not_smallILNS_7ZeroMemE0ELNS_12AllowReserveE1EEESA_m>
    3b9f:	e9 7c f4 ff ff       	jmpq   3020 <_ZN8snmalloc9AllocatorINS_24MemoryProviderStateMixinINS_8PALLinuxEEENS_15DefaultChunkMapINS_21GlobalPagemapTemplateINS_11FlatPagemapILm24EhEEEEEELb1EXadL_ZNS_16lazy_replacementEPvEEE16small_alloc_slowILNS_7ZeroMemE0ELNS_12AllowReserveE1EEESA_m>
    3ba4:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
    3bab:	00 00 00 
    3bae:	66 90                	xchg   %ax,%ax

This looks quite a bit slower potentially at the moment.

mjp41 · 2020-01-21T15:56:41Z

Okay, I have done a bit of maths, I think that we can simplify the whole piece of code to

SNMALLOC_FAST_PATH static size_t aligned_size(size_t alignment, size_t size)
{
    // Client responsible for checking alignment is not zero
    assert(alignment != 0);
    // Client responsible for checking alignment is not above SUPERSLAB_SIZE
    assert(alignment <= SUPERSLAB_SIZE);
    // Client responsible for checking alignment is a power of two
    assert(bits::next_pow2(alignment) == alignment);

    return ((alignment - 1) | (size - 1)) + 1;
}

I think this should interact correctly with the sizeclass calculation. It would be great to add a validation test that this does the correct thing. Something similar to
https://github.com/microsoft/snmalloc/blob/master/src/test/func/sizeclass/sizeclass.cc
Where we scan the ranges and check the maths works on every CI push. Something like

   sizeclass_to_size(size_to_sizeclass(aligned_size(alignment, size))) % alignment == 0

which should work except for 0.

SchrodingerZhu · 2020-01-22T01:01:36Z

Cool!
Do you mean sth like this:

#include <iostream>
#include <random>
#include <snmalloc.h>
#include <test/setup.h>
NOINLINE
snmalloc::sizeclass_t size_to_sizeclass(size_t size)
{
  return snmalloc::size_to_sizeclass(size);
}

NOINLINE
size_t sizeclass_to_size(snmalloc::sizeclass_t sc)
{
  return snmalloc::sizeclass_to_size(sc);
}

NOINLINE
size_t aligned_size(size_t alignment, size_t size)
{
  return snmalloc::aligned_size(alignment, size);
}

int main(int, char**)
{
  setup();

  bool failed = false;

  std::random_device dev;
  unsigned int seed = dev();
  std::cout << "random seed: " << seed << std::endl;

  std::uniform_int_distribution<size_t> dist(1);
  std::mt19937 rng(seed);

  for (snmalloc::sizeclass_t sz = 0; sz < snmalloc::NUM_SIZECLASSES; sz++)
  {
    // Separate printing for small and medium sizeclasses
    if (sz == snmalloc::NUM_SMALL_CLASSES)
      std::cout << std::endl;

    size_t this_size = snmalloc::sizeclass_to_size(sz),
           pre_size = (sz != 0) ? snmalloc::sizeclass_to_size(sz - 1) : 1;

    for (size_t alignment = snmalloc::bits::next_pow2(pre_size);
         alignment < this_size;
         alignment = snmalloc::bits::next_pow2(alignment + 1))
    {
      std::cout << std::endl;
      std::cout << "-- alignment: " << alignment << std::endl;
      for (int i = 0; i < 100; ++i)
      {
        size_t size = dist(rng) % alignment;
        std::cout << "---- size: " << size;
        bool flag = sizeclass_to_size(size_to_sizeclass(aligned_size(alignment, size))) %
            alignment !=
            0;
        std::cout << ((flag) ? " [FAILED]" : " [SUCCESS]") << std::endl;
        failed |= flag;
      }
    }
  }

  if (failed)
    abort();
}

mjp41 · 2020-01-22T09:43:35Z

@SchrodingerZhu, I would prefer something a bit more exhaustive. I think the following covers the core properties of the sizeclass calculation, and aligned_size, that mean this is correct.

void test_align_size()
{
  bool failed = false;

  assert(aligned_size(128, 160) == 256);

  for (size_t size = 1; size < sizeclass_to_size(NUM_SIZECLASSES - 1); size++)
  {
    size_t rsize = sizeclass_to_size(size_to_sizeclass(size));

    if (rsize < size)
    {
      std::cout << "Size class rounding shrunk: " << size << " -> " << rsize
                << std::endl;
      failed |= true;
    }

    auto lsb_rsize = rsize & -rsize;
    auto lsb_size = size & -size;
    if (lsb_rsize < lsb_size)
    {
      std::cout << "Original size more aligned than rounded size: " << size
                << " (" << lsb_size << ") -> " << rsize << " (" << lsb_rsize
                << ")" << std::endl;
      failed |= true;
    }

    for (size_t alignment_bits = 0; alignment_bits < SUPERSLAB_BITS;
         alignment_bits++)
    {
      auto alignment = (size_t)1 << alignment_bits;
      auto asize = aligned_size(alignment, size);

      if (asize < size)
      {
        std::cout << "Shrunk! Alignment: " << alignment << " Size: " << size
                  << " ASize: " << asize << std::endl;
        failed |= true;
      }

      if ((asize & (alignment - 1)) != 0)
      {
        std::cout << "Not aligned! Alignment: " << alignment
                  << " Size: " << size << " ASize: " << asize << std::endl;
        failed |= true;
      }
    }
  }

  if (failed)
    abort();
}

In particular, the sizeclass rounding, should never set lower bits, and the alignment rounding should
also never set lower bits. As the layout of the size classes ensures every allocation is aligned upto its LSB, then we should be good.

mjp41 · 2020-01-22T11:42:41Z

The code gen for this change is really nice, clang gives:

0000000000000000 <rust_alloc>:
   0:	48 89 f8             	mov    %rdi,%rax
   3:	64 48 8b 3c 25 00 00 	mov    %fs:0x0,%rdi
   a:	00 00 
   c:	48 83 c0 ff          	add    $0xffffffffffffffff,%rax
  10:	48 83 c6 ff          	add    $0xffffffffffffffff,%rsi
  14:	48 09 c6             	or     %rax,%rsi
  17:	48 81 fe ff ff 00 00 	cmp    $0xffff,%rsi
  1e:	77 1c                	ja     3c <rust_alloc+0x3c>
  20:	48 83 e6 f8          	and    $0xfffffffffffffff8,%rsi
  24:	48 8b b6 00 00 00 00 	mov    0x0(%rsi),%rsi
  2b:	48 8b 04 f7          	mov    (%rdi,%rsi,8),%rax
  2f:	48 85 c0             	test   %rax,%rax
  32:	74 11                	je     45 <rust_alloc+0x45>
  34:	48 8b 08             	mov    (%rax),%rcx
  37:	48 89 0c f7          	mov    %rcx,(%rdi,%rsi,8)
  3b:	c3                   	retq   
  3c:	48 83 c6 01          	add    $0x1,%rsi
  40:	e9 00 00 00 00       	jmpq   45 <rust_alloc+0x45>
  45:	e9 00 00 00 00       	jmpq   4a <rust_alloc+0x4a>
  4a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

I think this will be pretty fast.

Co-authored-by: mjp41 <mattpark@microsoft.com>

mjp41

I think you need a check for zero alignment in memalign inside malloc.cc as our libc tests expect align 0 to work.

Great, work. Thanks

mjp41 · 2020-01-22T13:33:18Z

CMakeLists.txt

+    add_library(${name} ${type} ${ARGN})
+    target_link_libraries(${name} snmalloc_lib)
+    if(MSVC)
+      target_compile_definitions(${name} PRIVATE "SNMALLOC_EXPORT=__declspec(dllexport)")


I am not sure what is required to link this into Rust. As this is a static lib, I would have said it isn't needed. @snf any thoughts?

src/override/rust.cc

SchrodingerZhu · 2020-01-22T14:09:44Z

The function posix_memalign() allocates size bytes and places the address of the allocated memory > in *memptr. The address of the allocated memory will be a multiple of alignment, which must be a > > power of two and a multiple of sizeof(void *). If size is 0, then posix_memalign() returns either NULL, > or a unique pointer value that can later be successfully passed to free(3).

The obsolete function memalign() allocates size bytes and returns a pointer to the allocated
memory. The memory address will be a multiple of alignment, which must be a power of two.

Are these requirements treated correctly in current method?

mjp41 · 2020-01-22T14:15:56Z

Are these requirements treated correctly in current method?

I believe so. Well, as much as they were before. We don't meet the posix_memalign spec when alignment > SUPERSLAB_SIZE. I intend to fix this later.

mjp41 · 2020-01-22T14:18:28Z

I noticed the codegen for dealloc where a size is supplied is not as good as it could be. I'll submit a separate PR refactoring that.

mjp41 · 2020-01-22T15:13:48Z

Unless you want to add something more, I'm happy to merge this when it passes CI.

SchrodingerZhu · 2020-01-22T15:27:59Z

I think it is ready then.

add rust support

a4bb49a

SchrodingerZhu requested a review from mjp41 January 20, 2020 13:04

adjust code style

f262dea

mjp41 requested changes Jan 20, 2020

View reviewed changes

CMakeLists.txt Outdated Show resolved Hide resolved

SchrodingerZhu added 3 commits January 20, 2020 23:58

move aligned_size to sizeclass.h

e2c4a0c

add static qualifier

d3a3038

adjust CMakeLists.txt, may broke CI tests

81fc1fe

SchrodingerZhu commented Jan 21, 2020

View reviewed changes

mjp41 requested changes Jan 21, 2020

View reviewed changes

src/mem/sizeclass.h Outdated Show resolved Hide resolved

SchrodingerZhu added 2 commits January 21, 2020 20:16

fix msvc's complaining on c++17

a7d88ca

use SNMALLOC_FAST_PATH as the decorator of aligned_size

8425b05

SchrodingerZhu and others added 2 commits January 22, 2020 21:26

adapt new alignment algorithm and add related test

756fc29

Co-authored-by: mjp41 <mattpark@microsoft.com>

fix test cases for msvc

4b6b385

mjp41 requested changes Jan 22, 2020

View reviewed changes

SchrodingerZhu added 2 commits January 22, 2020 22:03

add extra test for size == 0

27291e1

treat memory block of same sizeclass as the same

85544d1

fix formatting problem

a358eff

This was referenced Jan 22, 2020

snmalloc is out-of-date microsoft/verona#59

Closed

Adding Rust Support #109

Closed

remove extra declarations

2b2ab91

mjp41 approved these changes Jan 23, 2020

View reviewed changes

mjp41 merged commit 8304ded into microsoft:master Jan 23, 2020

Conversation

SchrodingerZhu commented Jan 20, 2020

Uh oh!

msftclas commented Jan 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

mjp41 commented Jan 20, 2020

Uh oh!

SchrodingerZhu commented Jan 21, 2020

Uh oh!

SchrodingerZhu Jan 21, 2020

Choose a reason for hiding this comment

Uh oh!

mjp41 Jan 22, 2020

Choose a reason for hiding this comment

Uh oh!

snf Jan 22, 2020

Choose a reason for hiding this comment

Uh oh!

mjp41 Jan 22, 2020

Choose a reason for hiding this comment

Uh oh!

SchrodingerZhu Jan 23, 2020

Choose a reason for hiding this comment

Uh oh!

mjp41 commented Jan 21, 2020

Uh oh!

Uh oh!

SchrodingerZhu commented Jan 21, 2020

Uh oh!

mjp41 commented Jan 21, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SchrodingerZhu commented Jan 21, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mjp41 commented Jan 21, 2020

Uh oh!

mjp41 commented Jan 21, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SchrodingerZhu commented Jan 22, 2020

Uh oh!

mjp41 commented Jan 22, 2020

Uh oh!

mjp41 commented Jan 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mjp41 left a comment

Choose a reason for hiding this comment

Uh oh!

mjp41 Jan 22, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

SchrodingerZhu commented Jan 22, 2020

Uh oh!

mjp41 commented Jan 22, 2020

Uh oh!

mjp41 commented Jan 22, 2020

Uh oh!

mjp41 commented Jan 22, 2020

Uh oh!

SchrodingerZhu commented Jan 22, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

msftclas commented Jan 20, 2020 •

edited

Loading

mjp41 commented Jan 21, 2020 •

edited

Loading

SchrodingerZhu commented Jan 21, 2020 •

edited

Loading

mjp41 commented Jan 21, 2020 •

edited

Loading

mjp41 commented Jan 22, 2020 •

edited

Loading